Shot Boundary Detection and High-level Features Extraction for the TREC Video Evaluation 2003
Authors
Abstract
The paper describes approaches to shot boundary detection and high-level feature extraction from video that have been developed at the Accenture Technology Labs for the TREC Video Evaluation 2003. For shot boundary detection, an approach that applies the chi-square test to the intensity histograms of adjacent I-frames has been used. Of the seventeen features suggested for the TREC Video Evaluation, three were selected: "People", "Weather news", and "Female speech". The "People" feature is detected with an approach based on multiple skin-tone face detection. The "Weather news" feature is detected using a sequence of simple filters that pass only segments of appropriate length, with a specific color distribution, and with video text in specific locations. For the "Female speech" feature, an approach has been implemented that combines speaker gender recognition using fundamental frequency distributions, skin-tone based face detection, and moving-lips detection using optical flow.

1. Shot Boundary Detection Task

Segmenting video clips into continuous camera shots is a prerequisite step for many video processing and analysis applications. With the compressed video format MPEG dominating today, we developed a cut detection agent that works in the compressed domain, i.e., it does not require full decompression of the video data, which significantly reduces the computational overhead. The cut detection is based on the method described in [1], which uses the chi-square test on three histograms (the global intensity histogram, the row intensity histogram, and the column intensity histogram) to evaluate the similarity between frames and find possible scene cuts.

Despite the variety of methods proposed for shot boundary detection, histogram comparison is the most common approach. However, it has been observed that scene cuts sometimes occur without causing significant changes in the global intensity histograms of consecutive frames. To address this problem, two additional histograms have been introduced in [1], namely the row (horizontal) and column (vertical) histograms. The three histograms are also used to distinguish two categories of scene changes, namely abrupt cuts and gradual transitions.

As mentioned above, the algorithm works in the compressed domain. In [1], only the I-frames in the MPEG video are used to find the approximate locations of the shot boundaries. Since I-frames are independently encoded and directly accessible from the MPEG data, using I-frames only reduces the computational overhead. However, because an I-frame usually occurs only every 12 or 15 frames, the algorithm in [1] does not give the exact frame number at which a scene change takes place. We refined the algorithm as described below.

I-frames are encoded in the same format as the JPEG specification, which is based on the Discrete Cosine Transform (DCT). To compute the three histograms, the first (DC) coefficients of the 8 x 8 DCT-encoded blocks are used; these coefficients represent the average block intensities. As in [1], the row and column histograms of an I-frame with M x N DCT blocks are defined as:
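The histogram definitions from [1] are cut off above, so the following is only a minimal sketch of the underlying technique under stated assumptions: it computes the global, row, and column intensity histograms from the DC coefficients of an I-frame's DCT blocks and compares adjacent I-frames with the chi-square statistic. The function names, bin count, and thresholding logic are illustrative rather than the parameters used in [1], and the DC coefficients are assumed to be already extracted from the MPEG stream and scaled to the 0-255 intensity range.

import numpy as np

def intensity_histograms(dc_blocks, num_bins=64):
    # dc_blocks: M x N array of the DC coefficients of the 8x8 DCT blocks,
    # i.e. the average intensity of each block (assumed already extracted
    # from the compressed stream and scaled to 0-255).
    global_hist, _ = np.histogram(dc_blocks, bins=num_bins, range=(0, 256))
    # One intensity histogram per row of blocks and one per column of blocks.
    row_hists = np.stack([np.histogram(r, bins=num_bins, range=(0, 256))[0]
                          for r in dc_blocks])
    col_hists = np.stack([np.histogram(c, bins=num_bins, range=(0, 256))[0]
                          for c in dc_blocks.T])
    return global_hist, row_hists, col_hists

def chi_square(h1, h2, eps=1e-9):
    # Chi-square distance between two histograms (or stacks of histograms).
    h1, h2 = h1.astype(float), h2.astype(float)
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def is_candidate_cut(prev_dc, curr_dc, threshold):
    # Flag a candidate shot boundary between two adjacent I-frames when any of
    # the three histogram distances exceeds a threshold (placeholder value).
    g1, r1, c1 = intensity_histograms(prev_dc)
    g2, r2, c2 = intensity_histograms(curr_dc)
    return max(chi_square(g1, g2),
               chi_square(r1, r2),
               chi_square(c1, c2)) > threshold

A candidate detected this way is only accurate to the nearest I-frame; the refinement mentioned above, which locates the exact cut frame between two I-frames, is not shown in this sketch.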
Similar Resources
Automatic Shot Boundary Detection Using Adaptive Thresholds
This paper describes the contribution of the TZI to the shot detection task of the TREC 2003 video analysis track (TRECVID). The approach comprises a feature extraction step and a shot detection step. In the feature extraction, three features are extracted: a frequency-domain approach based on FFT-features, a spatial-domain approach based on changes in the image luminance values, and another sp...
CLIPS at TREC 11: Experiments in Video Retrieval
This paper presents the systems used by CLIPS-IMAG to perform the Shot Boundary Detection (SBD) task, the Feature Extraction (FE) and the Search (S) task of the Video track of the TREC-11 conference. Results obtained for the TREC-11 evaluation are presented.
TREC Video Retrieval Evaluation: A Case Study and Status Report
The TREC Video Retrieval Evaluation is a multiyear, international effort, funded by the US Advanced Research and Development Activity (ARDA) and the National Institute of Standards and Technology (NIST) to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. Now beginning its fourth year, it aims over time to develop both a better understanding of how...
Semantic Video Analysis
OVERVIEW The objective of this component is to index videos based on semantic mid to high-level features. To achieve this, the component integrates different modules for video processing. As shown in the diagram, the component integrates the following components in order to extract the embedded semantics from the video: shot boundary detection for categorising shots with similar attributes; key...
The TREC VIdeo Retrieval Evaluation (TRECVID): A Case Study and Status Report
The TREC Video Retrieval Evaluation (TRECVID) is an annual international effort, funded by the US Advanced Research and Development Activity (ARDA) and the National Institute of Standards and Technology (NIST) to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. Now beginning its fourth year, TRECVID aims over time to develop both a better unders...
Publication date: 2003